NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Practical and Powerful Kernel-Based Change-Point Detection

https://doi.org/10.1109/TSP.2024.3479274

Song, Hoseung; Chen, Hao (October 2024, IEEE Transactions on Signal Processing)

Full Text Available
Generalized kernel two-sample tests

https://doi.org/10.1093/biomet/asad068

Song, Hoseung; Chen, Hao (November 2023, Biometrika)

Summary Kernel two-sample tests have been widely used for multivariate data to test equality of distributions. However, existing tests based on mapping distributions into a reproducing kernel Hilbert space mainly target specific alternatives and do not work well for some scenarios when the dimension of the data is moderate to high due to the curse of dimensionality. We propose a new test statistic that makes use of a common pattern under moderate and high dimensions and achieves substantial power improvements over existing kernel two-sample tests for a wide range of alternatives. We also propose alternative testing procedures that maintain high power with low computational cost, offering easy off-the-shelf tools for large datasets. The new approaches are compared to other state-of-the-art tests under various settings and show good performance. We showcase the new approaches through two applications: the comparison of musks and nonmusks using the shape of molecules, and the comparison of taxi trips starting from John F. Kennedy airport in consecutive months. All proposed methods are implemented in an R package kerTests.
more » « less
Asymptotic distribution-free changepoint detection for data with repeated observations

https://doi.org/10.1093/biomet/asab048

Song, Hoseung; Chen, Hao (September 2021, Biometrika)

Summary A nonparametric framework for changepoint detection, based on scan statistics utilizing graphs that represent similarities among observations, is gaining attention owing to its flexibility and good performance for high-dimensional and non-Euclidean data sequences. However, this graph-based framework faces challenges when there are repeated observations in the sequence, which is often the case for discrete data such as network data. In this article we extend the graph-based framework to solve this problem by averaging or taking the union of all possible optimal graphs resulting from repeated observations. We consider both the single-changepoint alternative and the changed-interval alternative, and derive analytical formulas to control the Type I error for the new methods, making them readily applicable to large datasets. The extended methods are illustrated on an application in detecting changes in a sequence of dynamic networks over time. All proposed methods are implemented in an $$\texttt{R}$$ package $$\texttt{gSeg}$$ available on CRAN.
more » « less
Full Text Available

Search for: All records